A neural network is a machine learning program, or model, that makes decisions in a manner similar to the human brain, by using processes that mimic the way biological neurons work together to identify phenomena, weigh options and arrive at conclusions.
“Cost Function”

A mathematical function whose minimization is deemed to coincide with the desired “best” model parameters/weights.
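As an illustration, assume a linear model $\hat{y} = w x + b$ and the mean squared error (MSE) as the cost. MSE is one common choice among many and is not prescribed by the slide. The sketch below shows that the “best” weights give the smallest cost:

```python
def mse_cost(w, b, xs, ys):
    """Mean squared error J(w, b) of the linear model y_hat = w*x + b."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n

# Toy data generated exactly by y = 2x + 1 (illustrative only)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

print(mse_cost(2.0, 1.0, xs, ys))  # 0.0 at the "best" weights
print(mse_cost(1.0, 0.0, xs, ys))  # 7.5 for worse weights
```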


The weights are updated through a process of minimising a cost function (the error, or cost), so that each update lowers the cost value.
“Activation Function”

The activation function helps the neural network use important information while suppressing irrelevant data points (i.e., it allows local “gating” of information).
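A minimal sketch of this gating, assuming three common activation functions (step, sigmoid and ReLU; the slide does not name specific ones). Negative pre-activations are suppressed by the step and ReLU, and squashed towards 0 by the sigmoid:

```python
import math

def step(z: float) -> float:
    """Perceptron-style threshold: passes 1 only if z >= 0."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z: float) -> float:
    """Smooth gate squashing any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z: float) -> float:
    """Passes positive inputs unchanged and zeroes out negative ones."""
    return max(0.0, z)

pre_activations = [-2.0, -0.5, 0.0, 0.5, 2.0]
for z in pre_activations:
    print(f"z={z:+.1f}  step={step(z):.0f}  sigmoid={sigmoid(z):.3f}  relu={relu(z):.1f}")
```

The sigmoid here uses the direct formula, which is fine for these small inputs; very large negative inputs would need a numerically safer variant.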
Training - Perceptron vs. Adaline vs. Modern Neuron

The weights are updated through a process of minimising a cost function.
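A minimal sketch of one weight update for each of the three. It assumes the textbook forms: Rosenblatt's perceptron rule, Adaline's delta rule on the linear output, and a sigmoid neuron trained by gradient descent on a squared-error cost (the “modern neuron” is assumed to be sigmoid-activated here). The key difference is which output the error is computed from:

```python
import math

def net_input(w, b, x):
    """Weighted sum z = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def perceptron_update(w, b, x, y, eta):
    """Perceptron: error uses the thresholded (0/1) output."""
    y_hat = 1.0 if net_input(w, b, x) >= 0 else 0.0
    err = y - y_hat
    return [wi + eta * err * xi for wi, xi in zip(w, x)], b + eta * err

def adaline_update(w, b, x, y, eta):
    """Adaline: error uses the linear (pre-threshold) output z."""
    err = y - net_input(w, b, x)
    return [wi + eta * err * xi for wi, xi in zip(w, x)], b + eta * err

def sigmoid_neuron_update(w, b, x, y, eta):
    """Modern neuron (assumed sigmoid + squared error): gradient includes a*(1-a)."""
    a = 1.0 / (1.0 + math.exp(-net_input(w, b, x)))
    delta = (y - a) * a * (1.0 - a)
    return [wi + eta * delta * xi for wi, xi in zip(w, x)], b + eta * delta

w0, b0 = [0.0, 0.0], 0.0
x, y, eta = [1.0, 2.0], 0.0, 0.1  # net input is 0, true label is 0
print(perceptron_update(w0, b0, x, y, eta))      # thresholded output 1 -> error -1
print(adaline_update(w0, b0, x, y, eta))         # linear output 0 -> no error, no change
print(sigmoid_neuron_update(w0, b0, x, y, eta))  # sigmoid output 0.5 -> small step
```

For the same sample, the perceptron makes a full-sized correction, Adaline makes none (its linear output already equals the label) and the sigmoid neuron makes a small graded step.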
In fact...

- A modern neuron is hardly ever used alone!

- Instead, it is used as a simple unit in a network of neurons, aptly called a “Neural Network”.


“Single-Layer Neural Network”
“Single-Layer Neurons in …”

“Multi-Layer Neurons in …”

“Depth vs Width”
“Deep Neural Network”

Lots of intermediate layers!
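A minimal sketch of a forward pass through a small network with two intermediate layers, assuming fully-connected layers with ReLU on the hidden layers. The layer sizes and weights are arbitrary illustrative values, not taken from the slides:

```python
def dense(x, W, b):
    """One fully-connected layer: each output neuron computes w . x + b."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(v):
    return [max(0.0, vi) for vi in v]

def forward(x, layers):
    """Pass x through every (W, b) layer; ReLU on all but the last."""
    for i, (W, b) in enumerate(layers):
        x = dense(x, W, b)
        if i < len(layers) - 1:
            x = relu(x)
    return x

# 2 inputs -> 3 hidden -> 3 hidden -> 1 output (illustrative weights)
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]], [0.0, -1.0, 0.0]),
    ([[1.0, 1.0, -0.5]], [0.5]),
]
print(forward([1.0, 2.0], layers))  # [1.75]
```

Making the network “deeper” is just a matter of appending more `(W, b)` pairs to `layers`.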


“Deep Learning”

A Machine Learning paradigm where raw input is fed to a deep learning algorithm, which automatically decides what aspects (features) are important for the task at hand.
“Dense vs Sparse …”
Two Broad Categories of NN

Feedforward neural network (Input Layer, Hidden Layer, Output Layer)
All connections “forward” (data passed through nodes only once)

ES691 - Mathematics for Machine Learning / Dr. Naveed R. Butt @ GIKI - FES


Recurrent neural network
At least some connections “recurrent” (some values memorized and used more than once)
“Convolutional Neural Networks”

A deep learning alternative to conventional image processing!
To Understand CNNs, We Need to Discuss...

- Convolution

- Conventional Image Processing


“Convolution”


convoluted /ˈkɒnvəl(j)uːtɪd/ (adjective)

1. (especially of an argument, story, or sentence) extremely complex and difficult to follow.
"the film is let down by a convoluted plot in which nothing really happens"
Similar: complicated, complex, involved, intricate, elaborate, impenetrable

2. TECHNICAL: intricately folded, twisted, or coiled.
"walnuts come in hard and convoluted shells"
“Convolution”

A mathematical operation that helps us calculate the response of a special type of systems: linear time-invariant (LTI) systems.

Convolution is often denoted by $*$.

Mathematically...

Application...

$x[n]$ → LTI system $h[n]$ → $y[n]$

$$y[n] = x[n] * h[n] = \sum_{m=-\infty}^{\infty} x[m]\,h[n-m]$$


Typically, when an input, say $x[n]$, passes through a linear time-invariant (LTI) operator (aka a system) with an associated function, say $h[n]$ (its impulse response), the output can be calculated by a special mathematical operation called “convolution”.


Numerically... 
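Numerically, the convolution sum $y[n] = \sum_m x[m]\,h[n-m]$ can be evaluated directly for finite-length signals. A minimal sketch, assuming both $x[n]$ and $h[n]$ are zero outside the listed samples, so the output has `len(x) + len(h) - 1` samples:

```python
def convolve(x, h):
    """Full discrete convolution y[n] = sum_m x[m] * h[n - m]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for m in range(len(x)):
            k = n - m
            if 0 <= k < len(h):  # h is zero outside its listed samples
                y[n] += x[m] * h[k]
    return y

x = [1.0, 2.0, 3.0]
h = [1.0, 1.0]  # an illustrative 2-tap "moving sum" system
print(convolve(x, h))  # [1.0, 3.0, 5.0, 3.0]
```

Swapping the arguments gives the same result, reflecting that convolution is commutative.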
Convolution in 2D...


Conventional Image Processing:

- Use a variety of pre-defined filters/kernels to extract features.
- Use these features to identify/classify the image.

Filters operate on images through “convolution”.

(Figure: Feature Extraction → Feature Maps → Features)
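A minimal sketch of a pre-defined filter producing a feature map, assuming a tiny grayscale image stored as nested lists and the standard 3×3 Sobel kernel for vertical edges. Like most CNN libraries, it slides the kernel without flipping it (strictly, cross-correlation); a true convolution would first rotate the kernel by 180°:

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image ('valid' positions only) and sum elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(kh) for b in range(kw)
            )
    return out

# Sobel kernel responding to left-to-right intensity changes (vertical edges)
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# 4x5 image: dark on the left, bright on the right
image = [[0, 0, 10, 10, 10] for _ in range(4)]

feature_map = conv2d_valid(image, sobel_x)
for row in feature_map:
    print(row)  # each row: [40, 40, 0]
```

The feature map is large where the window straddles the dark/bright boundary and zero over the uniform bright region.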
Q. What if we do not know which filter/kernel is best for our images?

Q. In fact, what if we let a Neural Network make its own filters that are best suited for the images at hand, which may not be restricted to the filters available in conventional image processing?

(Figure: conventional feature extraction families: Edge Detection, Shape-Based Features, Texture Analysis, Color & Intensity Features, Transform-Based Features, Blob Detection, Local Feature Descriptors, Corner Detection)

Could we let an NN choose/learn the “best” values for the filter?

NN Learns/Chooses its Own “Features”
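A toy sketch of this idea, offered as an illustration rather than the course's method. A 3×3 kernel starts at zero and is learned by gradient descent so that its feature map matches that of a “hidden” target filter, using the mean squared error as the cost. A real CNN differs in that the cost comes from the final classification error, not from a known target filter. The image values, learning rate and step count are arbitrary assumptions:

```python
import random

def conv2d_valid(image, kernel):
    """'Valid' sliding-window sum of products (no kernel flip, as in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def learn_kernel(image, target, size=3, lr=0.3, steps=800):
    """Gradient descent on J(K) = mean((conv2d_valid(image, K) - target)**2)."""
    K = [[0.0] * size for _ in range(size)]
    n = len(target) * len(target[0])
    for _ in range(steps):
        out = conv2d_valid(image, K)
        grad = [[0.0] * size for _ in range(size)]
        for i, row in enumerate(out):
            for j, y in enumerate(row):
                err = y - target[i][j]
                for a in range(size):
                    for b in range(size):
                        # dJ/dK[a][b] = (2/n) * sum of err * input pixel under K[a][b]
                        grad[a][b] += 2.0 * err * image[i + a][j + b] / n
        K = [[K[a][b] - lr * grad[a][b] for b in range(size)] for a in range(size)]
    return K

rng = random.Random(0)
image = [[rng.uniform(-1.0, 1.0) for _ in range(10)] for _ in range(10)]

hidden_kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # the "unknown" best filter (assumed)
target = conv2d_valid(image, hidden_kernel)

learned = learn_kernel(image, target)
for row in learned:
    print([round(v, 3) for v in row])
```

The printed kernel should come out very close to the hidden Sobel kernel, even though the learner was never told which filter to use.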
This could lead to one problem though... (“features” may not have physical meaning)

All right, then. Keep your secrets.
“Convolutional Neural Network”

A Deep NN that has a convolutional part (for automatic feature extraction) followed by a fully-connected standard part (for classification/identification), and in which both the kernels (first part) and weights (second part) are learned.


A Typical Convolutional Neural Network (CNN)

(Figure: Convolutions → Subsampling → Convolutions → Subsampling → Full connection, i.e., Convolution, Pooling, Convolution, Pooling; a kernel (filter) slides over an image patch (local receptive field))

In the convolutional part, we let the NN learn several different kernels in multiple stages while also reducing dimensions to save computation time.
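The dimension reduction is typically done by pooling (subsampling). A minimal sketch of 2×2 max pooling with stride 2, assumed here as one common choice (average pooling is another):

```python
def max_pool2d(fmap, size=2, stride=2):
    """Keep the largest value in each size x size window, moving by `stride`."""
    out = []
    for i in range(0, len(fmap) - size + 1, stride):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, stride):
            row.append(max(fmap[i + a][j + b] for a in range(size) for b in range(size)))
        out.append(row)
    return out

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [3, 2, 4, 6]]
print(max_pool2d(fmap))  # [[4, 5], [3, 7]]
```

A 4×4 feature map becomes 2×2, so the next convolution stage has a quarter of the values to process.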
CNN in Action

(Figure: Feature Maps)

Btw... CNNs are Feedforward Neural Networks

(Figure: Feedforward neural network vs. Recurrent neural network, each with Input, Hidden and Output Layers)
recur (verb)
re·cur (ri-ˈkər); recurred; recurring
(Etymology: Latin recurrere, from re- “again” + currere “run”)

intransitive verb

1: to have recourse: RESORT

2: to go back in thought or discourse
"on recurring to my letters of that date" (Thomas Jefferson)

3 a: to come up again for consideration
b: to come again to mind

4: to occur again after an interval: occur time after time
"the cancer recurred"
“Recurrent Neural Networks”

Recurrent neural network (Input Layer, Hidden Layer, Output Layer): at least some connections “recurrent” (some values memorized and used more than once).

When things occur in sequence, it helps to remember/know what comes before/after. Such as:

- Words in a sentence

- Speech & Music

- Any Time Series (daily weather)

“Tom is a cat. Tom's favorite food is ...” Nihari? Milk? Fish?

To have a better chance at predicting correctly, it helps to remember that Tom is a cat!


How Is “Recurrence” Achieved?

- Feedforward Neural Network: output at ‘t’ independent of previous outputs.

- Neuron in a Recurrent Neural Network: output at ‘t’ depends on last two outputs (info from input ‘t-1’ and info from input ‘t’).
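A minimal sketch of recurrence, assuming the common “vanilla” (Elman) RNN update $h_t = \tanh(w_x x_t + w_h h_{t-1} + b)$ with scalar weights chosen arbitrarily for illustration. The hidden state $h_{t-1}$ carries information from earlier inputs into the output at time $t$; the same neuron without the recurrent term gives the same output for the same $x_t$, whatever came before:

```python
import math

def rnn_run(xs, w_x=0.8, w_h=0.5, b=0.0):
    """Scalar Elman RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b), with h_0 = 0."""
    h = 0.0
    hs = []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)  # previous h feeds back in
        hs.append(h)
    return hs

def feedforward_run(xs, w_x=0.8, b=0.0):
    """Same neuron without the recurrent term: each output sees only x_t."""
    return [math.tanh(w_x * x + b) for x in xs]

seq_a = [1.0, 0.0, 0.0]
seq_b = [0.0, 0.0, 0.0]
print(rnn_run(seq_a)[-1], rnn_run(seq_b)[-1])                  # differ: memory of the first input
print(feedforward_run(seq_a)[-1], feedforward_run(seq_b)[-1])  # identical
```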
RNNs were instrumental in the development of the modern revolution in Machine Learning: Generative AI, Large Language Models, and Transformers.
Artificial Intelligence: the theory and methods to build machines that think and act like humans.

Expert System AI: programmers teach AI exactly how to solve specific problems by providing precise instructions and steps.

(Figure: aiforeducation.io)
From Memory to Attention!

When humans listen to something, we do not just memorize (short-term) what’s being said, but also focus on the important bits and extract the mutual relevance of the words being uttered.
From Memory to Attention!

- RNNs favor more recent information contained in words at the end of a sentence, while information earlier in the sentence tends to be given less consideration.

- Attention determines the relative importance of each word in a sentence relative to the other words in that sentence.

- Application not limited to language (sentences and words). Can be any time series (sequence and components).

(Figure: “Attention Is All You Need” by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser and Illia Polosukhin; Google Brain, Google Research, University of Toronto)
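A minimal sketch of the scaled dot-product attention defined in “Attention Is All You Need”, $\mathrm{Attention}(Q,K,V)=\mathrm{softmax}(QK^\top/\sqrt{d_k})\,V$. The three 2-dimensional “word” vectors are made up for illustration. For simplicity, Q, K and V are taken to be the raw vectors themselves; real Transformers first multiply them by learned projection matrices:

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Each query attends to all keys; returns (outputs, attention weights)."""
    d_k = len(K[0])
    weights, outputs = [], []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)  # relative importance of every word for this query
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights

# Hypothetical embeddings for "Tom", "cat", "food"
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = attention(X, X, X)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights sums to 1; in this toy example “Tom” attends more to “cat” than to “food” because their vectors are more similar.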


“Transformer”

A modern Deep Learning architecture that implements the “attention” mechanism efficiently.

Transformer Architecture (Simplified)
(Figure: Input and Output (shifted right) each pass through an Embedding and Positional Encodings; the Encoder stacks N blocks of Self Attention and Feed Forward; the result goes through Softmax to give Output Probabilities)

- Encoder processes the input sequence, breaking it down into meaningful representations.

- Decoder takes these representations and generates the output sequence, like a translation or a text continuation.

- Positional Encodings capture the location of each token in the sequence.
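A minimal sketch of the sinusoidal positional encodings proposed in “Attention Is All You Need”. Other schemes, including learned encodings, are also used, so treat this as one option. Position $pos$ and dimension index $i$ map to $PE_{(pos,2i)}=\sin(pos/10000^{2i/d})$ and $PE_{(pos,2i+1)}=\cos(pos/10000^{2i/d})$; the sizes below are arbitrary:

```python
import math

def positional_encoding(num_positions, d_model):
    """Sinusoidal encodings: even dims use sin, odd dims use cos."""
    pe = []
    for pos in range(num_positions):
        row = []
        for dim in range(d_model):
            i = dim // 2  # sin/cos pairs share the same frequency
            angle = pos / (10000 ** (2 * i / d_model))
            row.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

pe = positional_encoding(num_positions=4, d_model=6)
for pos, row in enumerate(pe):
    print(pos, [round(v, 3) for v in row])
```

Each row is added to the corresponding token embedding, so identical tokens at different positions get different inputs.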
“Large Language Model”

What are LLMs?

Large language models (LLMs) are a category of foundation models trained on immense amounts of data, making them capable of understanding and generating natural language and other types of content to perform a wide range of tasks.

(Figure: Generative Pre-trained Transformer)
Some More General Terms You Must Know...
Learning Parameters vs Hyper-Parameters...

- Hyper-parameters: parameters that are set by the user. E.g., learning rate in Gradient Descent.

- Learning parameters: parameters that an ML algorithm updates (“learns”) during training. E.g., weights in a Neural Network.

$$w_{new} = w_{old} - \alpha \frac{\partial J}{\partial w}$$

(Figure: cost $J$ versus weight $w$, marking the initial weight $w_{old}$, a step set by the learning rate $\alpha$, the new weight $w_{new}$, and the minimum point of the cost function)
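A minimal sketch of the update rule $w_{new} = w_{old} - \alpha\,\partial J/\partial w$ on the one-weight cost $J(w) = (w-3)^2$, chosen only for illustration (its minimum is at $w = 3$). The weight $w$ is the learned parameter, while the learning rate $\alpha$ is a hyper-parameter we pick, and the choice matters:

```python
def dJ_dw(w):
    """Derivative of the illustrative cost J(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)

def gradient_descent(w_init, alpha, steps):
    w = w_init
    for _ in range(steps):
        w = w - alpha * dJ_dw(w)  # w_new = w_old - alpha * dJ/dw
    return w

for alpha in (0.01, 0.1, 1.1):
    print(alpha, gradient_descent(w_init=0.0, alpha=alpha, steps=50))
```

With $\alpha = 0.01$ the weight creeps towards 3 and is still far away after 50 steps; $\alpha = 0.1$ reaches the minimum; $\alpha = 1.1$ overshoots further on every step and diverges.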
Training Data vs. Validation Data vs. Test Data

- Train model on Training Set.
- Evaluate model on Validation Set.
- Tweak model according to results on Validation Set (and repeat).
- Pick model that does best on Validation Set.
- Confirm results on Test Set.

The Validation Set is used to tweak “hyper-parameters” (e.g., learning rate).

(Figure: V7 Labs)
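A minimal sketch of that workflow under stated assumptions: a toy 1-D regression task with made-up data ($y \approx 2x$ plus small noise), a one-weight model $\hat{y} = w x$ trained by gradient descent, the learning rate as the only hyper-parameter, and a 60/20/20 split:

```python
import random

def fit(xs, ys, alpha, steps=100):
    """Learn w in y_hat = w * x by gradient descent on the MSE (alpha is a hyper-parameter)."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= alpha * grad
    return w

def mse(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def unzip(pairs):
    return [p[0] for p in pairs], [p[1] for p in pairs]

rng = random.Random(42)
data = []
for _ in range(100):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 2.0 * x + rng.gauss(0.0, 0.1)))  # assumed true slope 2, small noise
rng.shuffle(data)
(xtr, ytr), (xva, yva), (xte, yte) = unzip(data[:60]), unzip(data[60:80]), unzip(data[80:])

# Tweak the hyper-parameter using the validation set only
candidates = [0.001, 0.01, 0.5]
best_alpha = min(candidates, key=lambda a: mse(fit(xtr, ytr, a), xva, yva))

# Confirm once on the untouched test set
w = fit(xtr, ytr, best_alpha)
print(best_alpha, round(w, 3), mse(w, xte, yte))
```

The test set is used only once at the end, so its error remains an honest estimate of performance on unseen data.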
Overfitting vs. Underfitting

(Figure: Overfitting, Right Fit and Underfitting, shown for Classification and Regression)

HW: Why is overfitting bad?
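A minimal regression sketch of the three cases, with all data invented for illustration. The underlying trend is assumed to be $y = 2x$, and a fixed alternating ±0.2 “noise” pattern keeps the example reproducible. The underfit model is a constant, the right fit is a least-squares line, and the overfit model is a polynomial through every training point, so it memorizes the noise:

```python
def fit_constant(xs, ys):
    """Underfitting: ignores x entirely."""
    c = sum(ys) / len(ys)
    return lambda x: c

def fit_line(xs, ys):
    """Right fit (for this data): least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)

def fit_interpolant(xs, ys):
    """Overfitting: Lagrange polynomial passing through every training point."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def true_fn(x):
    return 2.0 * x  # assumed underlying trend

x_train = [i / 9 for i in range(10)]
y_train = [true_fn(x) + 0.2 * (-1) ** i for i, x in enumerate(x_train)]
x_test = [(i + 0.5) / 9 for i in range(9)]  # points between the training points
y_test = [true_fn(x) for x in x_test]

results = {}
for name, fit in [("underfit", fit_constant), ("right fit", fit_line), ("overfit", fit_interpolant)]:
    model = fit(x_train, y_train)
    results[name] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"{name:9s} train MSE={results[name][0]:.4f}  test MSE={results[name][1]:.4f}")
```

The overfit model has zero training error yet the largest test error, which is exactly why a low training error alone is not a good sign.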
Overfitting vs. Underfitting: Not Recognizing Dad After Haircut
Backpropagation [of Error]

During each iteration of training, an NN checks how wrong its final output was, and propagates this information backwards to all neurons in all layers so they can try to adjust themselves towards lesser error.

(Figure: Input Layer → Hidden Layer → Output Layer → Predicted output. Error = difference between predicted output and actual output. The error is sent back to each neuron in the backward direction, and the gradient of the error is calculated with respect to each weight.)
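A minimal sketch of backpropagation for a tiny network with one input, one sigmoid hidden neuron and one linear output, trained on the squared error $\tfrac{1}{2}(\hat{y}-y)^2$. The architecture and all numbers are invented for illustration. The backward pass applies the chain rule layer by layer, and a finite-difference check confirms the gradients:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)  # hidden layer
    y_hat = w2 * h + b2       # output layer (linear)
    return h, y_hat

def loss(params, x, y):
    _, y_hat = forward(params, x)
    return 0.5 * (y_hat - y) ** 2

def backprop(params, x, y):
    """Chain rule from the output error back to every weight."""
    w1, b1, w2, b2 = params
    h, y_hat = forward(params, x)
    d_out = y_hat - y                    # dL/dy_hat: error at the output
    grad_w2, grad_b2 = d_out * h, d_out
    d_hidden = d_out * w2 * h * (1 - h)  # error sent back to the hidden neuron
    grad_w1, grad_b1 = d_hidden * x, d_hidden
    return [grad_w1, grad_b1, grad_w2, grad_b2]

params, x, y = [0.5, -0.2, 1.5, 0.1], 2.0, 1.0
analytic = backprop(params, x, y)

# Central finite-difference estimate of each gradient
eps = 1e-6
numeric = []
for k in range(len(params)):
    plus = params.copy()
    plus[k] += eps
    minus = params.copy()
    minus[k] -= eps
    numeric.append((loss(plus, x, y) - loss(minus, x, y)) / (2 * eps))

print([round(g, 6) for g in analytic])
print([round(g, 6) for g in numeric])

# One gradient-descent step with the backpropagated gradients lowers the error
alpha = 0.1
new_params = [p - alpha * g for p, g in zip(params, analytic)]
print(loss(params, x, y), "->", loss(new_params, x, y))
```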




Questions?? Thoughts??


